Clamp exponential histogram percentiles to min/max #135632

JonasKunz · 2025-09-29T14:26:21Z

Before this PR, the exponential histogram percentile algorithm answered percentile queries using a fixed representative point inside each bucket (the point of least relative error).

Now that the exact min and max are known and stored with the histogram, this can lead to confusing results where, e.g., the estimated p99 is larger than the histogram's max.

This PR avoids this problem by clamping estimated percentiles to the [min, max] range.
If the histogram was constructed without knowing the exact min and max, they are estimated from the lower bound of the lowest bucket and the upper bound of the highest bucket. In that case, clamping has no effect on the percentile computation.
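
A minimal sketch of the clamping described above, assuming a histogram that may or may not carry exact bounds. It is not the Elasticsearch code: the class `PercentileClampSketch`, its methods, and the use of `null` for "exact value not recorded" are all hypothetical. It uses `Math.min`/`Math.max` instead of `Math.clamp` (Java 21+) so it also compiles on older JDKs.

```java
/**
 * Hypothetical sketch, not the Elasticsearch implementation: clamp a
 * percentile estimate to the histogram's [min, max] range.
 */
final class PercentileClampSketch {

    private PercentileClampSketch() {}

    /** Behaves like Java 21's Math.clamp(double, double, double) for ordinary inputs. */
    static double clamp(double value, double min, double max) {
        if (min > max) {
            throw new IllegalArgumentException("min " + min + " > max " + max);
        }
        return Math.min(Math.max(value, min), max);
    }

    /**
     * Clamps an estimate to [min, max]. When the exact min or max was not
     * recorded (null here, by assumption), it falls back to the lower bound of
     * the lowest populated bucket or the upper bound of the highest one.
     */
    static double clampPercentile(
        double estimate,
        Double exactMin,
        Double exactMax,
        double lowestBucketLower,
        double highestBucketUpper
    ) {
        double min = exactMin != null ? exactMin : lowestBucketLower;
        double max = exactMax != null ? exactMax : highestBucketUpper;
        return clamp(estimate, min, max);
    }
}
```

For example, an estimate of 4/3 from a single [1,2] bucket is pulled down to 1.1 when the exact max is 1.1, but stays 4/3 when only the bucket bounds are available, because an estimate taken from inside the buckets can never fall outside their outer bounds.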

elasticsearchmachine · 2025-09-29T14:27:19Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

kkrik-es · 2025-10-01T12:57:16Z

...ogram/src/main/java/org/elasticsearch/exponentialhistogram/ExponentialHistogramQuantile.java

            result = values.valueAtPreviousRank() * (1 - upperFactor) + values.valueAtRank() * upperFactor;
        }
-        return removeNegativeZero(result);
+        return removeNegativeZero(Math.clamp(result, histo.min(), histo.max()));


Hm, percentiles shouldn't return values outside min and max. Should we be doing interpolation between min/max and the borderline ranks instead?

You should probably have hardcoded logic for p0 and p100, now that you have min and max.

JonasKunz · 2025-10-01

To rephrase what this PR tries to solve:

Let's assume the highest populated bucket in our histogram is [1,2].

The percentile algorithm assumes that all values that fell into the bucket have the value 1.3333 (the point of least relative error). Therefore, if a percentile falling into that bucket is requested, 1.3333 is returned.
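
The 1.3333 above is the value that makes the relative error against both bucket bounds equal, which works out to the harmonic mean of the bounds. A small sketch assuming positive bounds; the class and method names are hypothetical, and the Elasticsearch code may compute this value differently.

```java
/**
 * Hypothetical helper: the point of least relative error (polre) of a bucket
 * with positive bounds [lower, upper].
 */
final class PolreSketch {

    private PolreSketch() {}

    /**
     * Choosing x so the worst-case relative error is equal at both ends:
     *   (x - lower) / lower == (upper - x) / upper
     * gives x = 2 * lower * upper / (lower + upper), the harmonic mean.
     */
    static double pointOfLeastRelativeError(double lower, double upper) {
        if (!(lower > 0 && upper >= lower)) {
            throw new IllegalArgumentException("expects 0 < lower <= upper");
        }
        return 2 * lower * upper / (lower + upper);
    }
}
```

For the [1,2] bucket this gives 2 · 1 · 2 / 3 = 4/3, so a value of exactly 1 or exactly 2 is off by the same relative error of 1/3.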

However, if the histogram's max is actually 1.1, we know that 1.3333 is incorrect: the [1,2] bucket was only populated with values in the range [1, 1.1].

So this PR does the following: if the percentile falls into the highest (or lowest) bucket, the value assumed for that bucket is moved inside [min, max].
If the percentile being estimated does not lie in the outermost buckets (the ones containing min and max), the clamping has no effect: the bucket's representative value is larger than min and smaller than max anyway.
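
The two cases in the paragraphs above can be sketched by clamping each bucket's representative value. This is a hypothetical illustration, not the Elasticsearch code; the class name and the example bounds (min 0.3, max 1.1) are assumptions chosen to match the [1,2] example.

```java
/**
 * Hypothetical sketch: the value assumed for every entry of a bucket,
 * moved inside the histogram's exact [min, max].
 */
final class ClampedRepresentativeSketch {

    private ClampedRepresentativeSketch() {}

    /** Point of least relative error for positive bounds [lower, upper]. */
    static double polre(double lower, double upper) {
        return 2 * lower * upper / (lower + upper);
    }

    /**
     * Only the outermost buckets can contain min or max, so only their
     * representative value can actually be moved by this clamp.
     */
    static double representativeValue(double lower, double upper, double min, double max) {
        return Math.min(Math.max(polre(lower, upper), min), max);
    }
}
```

With max = 1.1, the highest bucket [1,2] is represented by 1.1 instead of 4/3, while an inner bucket such as [0.5,1] keeps its polre of 2/3 because that value already lies between min and max.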

Therefore I don't understand (a) what the interpolation you are suggesting would do and (b) why we should have hardcoded logic for p0 and p100, as those are already covered correctly by the existing logic.

kkrik-es · 2025-10-01

Let's see. If max is 1.1, i.e. less than the polre (point of least relative error) :P, the polre should not be used for the highest bucket. Instead, we should interpolate between the polre of the second-highest bucket and the max value. Using the polre for the highest bucket is provably inaccurate in this case.
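
To make the difference concrete, assume the second-highest bucket is [0.5,1] (polre 2/3), the highest is [1,2] (polre 4/3), max is 1.1, and the requested rank sits halfway between the two. A hypothetical sketch that reuses the interpolation formula from the diff above; all names and numbers are illustrative assumptions.

```java
/**
 * Hypothetical comparison of two ways to bound an interpolated percentile,
 * using the interpolation from the diff:
 *   previous * (1 - upperFactor) + current * upperFactor
 */
final class InterpolationOrderSketch {

    private InterpolationOrderSketch() {}

    static double interpolate(double previous, double current, double upperFactor) {
        return previous * (1 - upperFactor) + current * upperFactor;
    }

    /** First PR revision: interpolate between the polres, then clamp the result. */
    static double clampAfter(double previous, double current, double upperFactor, double min, double max) {
        return Math.min(Math.max(interpolate(previous, current, upperFactor), min), max);
    }

    /** Reviewer's suggestion: use max instead of the highest bucket's polre, then interpolate. */
    static double interpolateTowardMax(double previous, double current, double upperFactor, double max) {
        return interpolate(previous, Math.min(current, max), upperFactor);
    }
}
```

Clamping afterwards returns 1.0 (the midpoint of 2/3 and 4/3, already below max, so the clamp does nothing), while interpolating toward max returns about 0.883. Only the second result reflects that no value in the highest bucket exceeded 1.1.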

JonasKunz · 2025-10-01

I think I understand what you mean now:
You are referring to the case where the percentile lies between the second-highest and the highest bucket and is therefore interpolated, right?

That means it is better to clamp the ValueAndPreviousValue values before doing the interpolation, correct?
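
A sketch of clamping both interpolation inputs before interpolating, which also covers the lower end (compare the later "Add test for min interpolation" commit). The record below is a hypothetical stand-in: the real `ValueAndPreviousValue` type in Elasticsearch may have a different shape. Records require Java 16 or later.

```java
/**
 * Hypothetical stand-in for the pair of values the interpolation works on;
 * not the actual Elasticsearch ValueAndPreviousValue type.
 */
record ValueAndPreviousValueSketch(double valueAtPreviousRank, double valueAtRank) {

    /** Clamps both values into [min, max] so the interpolation never leaves that range. */
    ValueAndPreviousValueSketch clampedTo(double min, double max) {
        return new ValueAndPreviousValueSketch(
            Math.min(Math.max(valueAtPreviousRank, min), max),
            Math.min(Math.max(valueAtRank, min), max)
        );
    }

    /** Linear interpolation between the two ranks, as in the diff. */
    double interpolate(double upperFactor) {
        return valueAtPreviousRank * (1 - upperFactor) + valueAtRank * upperFactor;
    }
}
```

At the lower end, for example, if the lowest bucket is [1,2] (polre 4/3) but the exact min is 1.9, the previous-rank value becomes 1.9 before interpolating toward the next bucket's polre.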

kkrik-es · 2025-10-01

Right. Want to give it a try and add some tests to see what you get?

JonasKunz · 2025-10-01

Fixed in 95166d9, which also adds a test that failed with the previous behaviour.

...al-histogram/src/test/java/org/elasticsearch/exponentialhistogram/QuantileAccuracyTests.java

Clamp exponential histogram percentiles to min/max

aa22fd2

JonasKunz self-assigned this Sep 29, 2025

JonasKunz added :StorageEngine/Mapping (the storage-related side of mappings) and >non-issue labels Sep 29, 2025

elasticsearchmachine added Team:StorageEngine, external-contributor (pull request authored by a developer outside the Elasticsearch team), and v9.2.0 labels Sep 29, 2025

JonasKunz requested review from felixbarny and kkrik-es September 29, 2025 14:38

[CI] Auto commit changes from spotless

2239181

felixbarny approved these changes Sep 29, 2025


Merge branch 'main' into clamp-exp-histo-percentiles

afb29d6

kkrik-es reviewed Oct 1, 2025


JonasKunz and others added 6 commits October 1, 2025 16:07

Clamp before interpolation

95166d9

Fix comment

ab952fd

Fix test name

e75933f

Fix testPercentilesClampedToMinMax test

9a0fe02

[CI] Auto commit changes from spotless

790e687

Merge branch 'main' into clamp-exp-histo-percentiles

cfc971a

kkrik-es reviewed Oct 1, 2025


...al-histogram/src/test/java/org/elasticsearch/exponentialhistogram/QuantileAccuracyTests.java

kkrik-es approved these changes Oct 1, 2025


JonasKunz added 2 commits October 2, 2025 08:50

Add test for min interpolation

02ee881

Merge branch 'main' into clamp-exp-histo-percentiles

6b98807

JonasKunz enabled auto-merge (squash) October 2, 2025 07:03

elasticsearchmachine added v9.3.0 and removed v9.2.0 labels Oct 2, 2025

JonasKunz merged commit 679c407 into elastic:main Oct 2, 2025
32 of 34 checks passed

JonasKunz deleted the clamp-exp-histo-percentiles branch October 2, 2025 09:21

4 participants
